Vision of an Open Library



Two years ago many of us gathered to offer a vision of an Open
Library. We would work together to offer Universal Access to All
Knowledge. To do this we would bring our libraries into the digital
world by building and offering digital services in the tradition of our
institutions.

And what a world it could be! Imagine a neighborhood library that
has millions of books on its virtual bookshelf so that the talented
math whiz in a rural community can explore high math
worth reading from all the ages.

First we needed to build scanning centers in great libraries, and
create digital services that will delight and inform readers all over
the world.

Today I am happy to stand before you to say that libraries, en
masse, are indeed taking up the challenge and going Open.

As we speak there are 8 scanning centers scanning books. There
are scanning centers scanning microfilm. There are libraries
working together to build great collections. We are starting on the
path of building great services leveraging the massive collections
that are now online.

And we are doing it in the open.

I would like to invite several of the contributors to this effort to step
forward and speak briefly about some of the accomplishments and
some of the near term plans.
Libraries Going Open

Joining the existing 50 contributors of the Open Content
Alliance this year are:



Boston College 
Boston Public Library 
Boston University 
Brandeis University 
Brown University 

Marine Biological Laboratory & Woods Hole
Oceanographic Institution
Massachusetts Institute of Technology
Northeastern University
State Library of Massachusetts
Tufts University
University of Connecticut
University of Illinois at Urbana-Champaign
University of Massachusetts Amherst
University of Massachusetts Boston
University of Massachusetts Dartmouth
University of Massachusetts Lowell
University of Massachusetts Medical Center
University of New Hampshire
University of North Carolina, Chapel Hill
Wellesley College
Williams College




Susan Wawrzaszek, Director of Libraries at Brandeis 
University, sitting at an Internet Archive Scribe book 
scanning machine 



Mass Scanning of Books 

The Scribe system has established that color book scanning at
high quality can be achieved for 10 cents a page when
performed on a massive scale. Already, 50 million pages of
200,000 books have been scanned in 8 scanning centers in 3
countries.
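The figures above can be checked with a little arithmetic. The sketch below is illustrative only: it takes the round numbers quoted in the text (10 cents a page, 50 million pages, 200,000 books) at face value, and the function and constant names are assumptions made for this example.

```python
from decimal import Decimal

# Round figures quoted in the text; treated here as exact for illustration.
COST_PER_PAGE = Decimal("0.10")   # dollars per page
PAGES_SCANNED = 50_000_000
BOOKS_SCANNED = 200_000


def scanning_cost(pages, cost_per_page=COST_PER_PAGE):
    """Return the scanning cost in dollars for a number of pages."""
    return pages * cost_per_page


# Average book length implied by the two totals.
avg_pages_per_book = PAGES_SCANNED // BOOKS_SCANNED

# Cost of an average book and of everything scanned so far.
cost_per_book = scanning_cost(avg_pages_per_book)
total_cost = scanning_cost(PAGES_SCANNED)

print(avg_pages_per_book, cost_per_book, total_cost)
```

Under these assumptions an average book runs to 250 pages and costs about 25 dollars to scan, and the 50 million pages scanned so far correspond to roughly 5 million dollars.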

This last year, the Internet Archive has worked to improve
quality and efficiency by learning from the first 2 years of
operation. The rewritten software, Scribe2, is now more
flexible in dealing with different kinds of books, uses less
bandwidth, and is able to use new cameras.

As more scanning centers open this year, the focus will be on
quality and flexibility.




Foldout station



A foldout map located at the back of a book



Fold-outs 



Fold-outs, or gatefolds, are found in up to 10 percent of the
books in some collections. Internet Archive has developed
technology to add scanning of fold-outs to its mass
digitization workflow.

This process uses a separate top-down scanner with digital
cameras (similar to those used on the Scribe book scanner)
to produce image files that can be merged back into the
book so that the resulting derivative files incorporate the
fold-outs in natural ways.

This recently completed solution will be put into production
in the Richmond and Toronto scanning centers before the
end of the year.
An example of a scanned microfilm book



Mass Digitization of
Microfilm

Millions of pages of books, newspapers and journals have been
microfilmed in the past few decades. To capitalize on that huge
investment, the Internet Archive, working with the University of
Illinois and the University of Chicago, has developed tools and
techniques for mass digitization of microfilm.

In our pilot program, we digitized l&OO reels from the University of
Chicago brittle book collection (approximately 6000 books) through
a vendor, and Internet Archive developed tools for processing the
output. This collection of brittle books was microfilmed in the 1990s
using NEH funding.

To bring the costs down, the Internet Archive has now bought
some of the best microfilm scanning machines, rather than using
vendors for scanning services.

At the University of Illinois, Internet Archive placed a microfilm
scanning machine on loan in the library and the University staffed it
to process approximately 1000 books. The collection so far focuses
on railroad journals and foreign language monographs.

For both projects, the resulting digitized files were processed to
separate the reels into individual items. MARC records were
retrieved from the library catalog, and the image files were cropped
and de-skewed, run through optical character recognition software to
produce three different OCR formats, and made into
downloadable, searchable PDFs and DjVu files. All of these files
are stored on servers and backed up to secondary servers to
provide long term preservation and access.
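The processing steps described above can be sketched as a small pipeline. This is only an outline under stated assumptions, not the Internet Archive's actual tooling: every name here (Item, split_reel, crop_and_deskew, process_item) is hypothetical, the image and OCR steps are placeholders, and the three OCR outputs are numbered generically because the text does not name the formats.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One book or journal volume cut out of a microfilm reel (hypothetical model)."""
    identifier: str
    frames: list
    marc_record: dict = field(default_factory=dict)
    ocr_outputs: list = field(default_factory=list)
    derivatives: list = field(default_factory=list)


def split_reel(reel_id, frames, item_starts):
    """Separate a reel's frames into items, given the frame index where each item starts."""
    starts = sorted(item_starts)
    ends = starts[1:] + [len(frames)]
    return [
        Item(f"{reel_id}_{n:04d}", frames[start:end])
        for n, (start, end) in enumerate(zip(starts, ends))
    ]


def crop_and_deskew(frame):
    # Placeholder: a real step would trim borders and straighten the image.
    return frame + ".clean"


def process_item(item, catalog):
    """Attach catalog metadata, clean the images, and list the output files."""
    # Look up the MARC record; `catalog` is a plain dict standing in for the library catalog.
    item.marc_record = catalog.get(item.identifier, {})
    item.frames = [crop_and_deskew(f) for f in item.frames]
    # Three OCR outputs, numbered generically (the formats are not named in the text).
    item.ocr_outputs = [f"{item.identifier}.ocr{n}" for n in (1, 2, 3)]
    # Downloadable, searchable derivatives.
    item.derivatives = [item.identifier + ".pdf", item.identifier + ".djvu"]
    return item


items = split_reel("reel1", ["f0", "f1", "f2", "f3", "f4"], [0, 3])
catalog = {"reel1_0000": {"title": "Example title"}}
processed = [process_item(item, catalog) for item in items]
```

Keeping each stage as its own function mirrors the order given in the text: separation into items comes first, so that metadata lookup and derivative creation work per item rather than per reel.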

http://www.archive.org/details/microfilm




Andrew Carnegie 



New Mass Microfilm 
Digitization Programs 



In order to expand this project to include more libraries, the
Internet Archive is launching a microfilm digitization program
using two cost models:

Pay Per Page Model: Libraries may pay per page digitized,
similar to the current book digitization program. Funding may
come from the library itself, grants, or foundations.
University of Illinois and the Internet Archive, for instance,
are interested in handling other institutions' microfilm in a
cost recovery model.

Carnegie Model: For qualifying libraries, the Internet Archive
will fund all equipment, hardware, software, processing and
hosting costs, while the library will supply the microfilm and
the labor to keep the microfilm scanning machine running for
2 shifts per day for ? years.

Please help build access to the Public Domain by digitizing
microfilm holdings.
A scanned journal details page, table of contents, and

Mass Digitization of
Journal Literature



Historical journal literature is generally bound with
several issues to a volume, and each issue may incorporate
multiple articles. While mass digitization of books has reduced
the cost of page digitization and optical character recognition
to 10 cents per page, the modern Internet user would like to
find one article at a time within journal literature and be able to
navigate based on authors, titles and embedded citations.

Commercial and subscription services have typically used
people to add this metadata to scanned journal literature. This
is a fairly expensive technique compared to automated
methods, but automated methods, on the other hand, introduce
errors.

To solve this problem, Internet Archive has partnered with the
Smithsonian Institution and Pennsylvania State University to
build an automated solution for separating scanned volumes
into individual articles and adding metadata for author, title and
page numbers. While much work remains to be done, the
initial results are promising.
Biodiversity Heritage
Library



The Encyclopedia of Life project strives to make a web page
for every species. On this page will be genetic information,
photos, sightings, and links to every article that mentions that
species.

The Biodiversity Heritage Library is the scientific literature,
past and present, dealing with species, and it is a part of the
Encyclopedia of Life. The existing literature is mostly
scientific journals published over the last 200 years. Much of
this literature is cared for by the large natural history
museums.

The libraries of these museums are working together with the
Internet Archive to digitize these journals and work with
journal publishers to find a way through the copyright issues.
Legal advice is provided by the Electronic Frontier
Foundation.




Boston Public Library 
Loaning Out-of-Print
Books



Our libraries hold in-print, out-of-print, and out-of-copyright
books. Open Content Alliance libraries are scanning out-of-copyright
works, while in-print books are starting to be made
available from publishers' websites and retailers. But to build a
complete library we need the out-of-print works, which may
represent as much as 50% of our library collections.

Today we are announcing that several libraries will work
together to scan out-of-print books and offer these to users
through the interlibrary loan system. We believe this can be a
tremendously valuable way to increase scholarly and public
access to hard-to-find resources.

Out-of-print books can represent huge portions of library
collections. Bernard Margolis of the Boston Public Library
estimates that several million of the eight million volumes in the
BPL collection are out of print. By scanning these volumes,
libraries will be better able to fulfill their mission of providing
access to scholars and the public. For every librarian who has
received a request for a book that is out-of-print (as opposed to
out-of-stock), this initiative will provide a mechanism to meet
the library patron's needs.

In the coming year, the Boston Public Library, the Marine
Biological Laboratory at Woods Hole, Universidad Francisco
Marroquín of Guatemala, and the Internet Archive are
pioneering a digital interlibrary loan service around out-of-print
books.




Internet Bookmobile in Uganda

Print On Demand



Cost effective Print on Demand (POD) is now available
through centralized services and distributed services.
Internet Archive, with Sloan Foundation funding, has made
the scanned books POD-ready by creating high resolution
black and white PDFs that are optimized for cost-effective
printing on existing laser printers.

Binding machines are now available for small installations
for approximately $3000 from Powis Parker. Large,
automated book production machines are now available
from On Demand Books.

Internet Archive is using one of these machines to create a
prototype library called the Open Library where patrons can
browse, read, print, and leave with printed books.
Scan On Demand



Internet Archive, in partnership with OCA contributors, will
develop in 2008 a Scan on Demand system that allows
libraries and library patrons to request digital copies of any
eligible book. Book digitization requests will be processed by
existing regional scanning centers, and downloadable,
searchable files will be made available within a few days.

Records for books that are available to be scanned, as well
as books that have already been scanned, will be made
available to the Open Library project, as well as to
participating libraries. Libraries will be able to update their
online public access catalogs (OPACs) to contain links to
already digitized resources, and to include a link to have
books scanned on demand. This request will be routed to an
Internet Archive scanning center where the book will be
scanned and made available to the user, and to the general
public if appropriate. If the system becomes as popular as we
hope it will, we will also integrate cost recovery mechanisms.
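One way an OPAC might choose which link to show for a catalog record is sketched below. This is a hypothetical illustration, not the planned system: the CatalogRecord fields, the opac_link function, and the example.org URLs are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Hypothetical endpoints, used only for illustration.
READ_BASE = "https://example.org/details/"
REQUEST_BASE = "https://example.org/scan-request?record="


@dataclass
class CatalogRecord:
    """A minimal stand-in for an OPAC record (hypothetical fields)."""
    record_id: str
    title: str
    eligible: bool                    # may this book be scanned on request?
    scanned_id: Optional[str] = None  # identifier of an existing digital copy, if any


def opac_link(record):
    """Return (label, url) for the link an OPAC could display, or None."""
    if record.scanned_id:
        # Already digitized: link straight to the existing copy.
        return ("Read online", READ_BASE + quote(record.scanned_id))
    if record.eligible:
        # Not yet digitized but eligible: offer a scan-on-demand request.
        return ("Scan on demand", REQUEST_BASE + quote(record.record_id))
    # Neither scanned nor eligible: show no link.
    return None


print(opac_link(CatalogRecord("b1", "Already scanned", True, "item1")))
print(opac_link(CatalogRecord("b2", "Eligible, not scanned", True)))
print(opac_link(CatalogRecord("b3", "Not eligible", False)))
```

Checking for an existing scan first means a book is never queued for scanning twice, which matters if requests are later tied to cost recovery.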
$100 Laptop Project

to bring digital books to
millions of children



An ambitious project to build an appropriate laptop for
children all over the world will be offering public domain
books to their users.

The Chief Technology Officer, Mary Lou Jepsen, said "since
the 100 Dollar Laptop is designed to be an electronic book
reader as well as a general purpose computer, we are very
happy to offer the international book collection of the Open
Content Alliance to our users."
demo.openlibrary.org



One Web Page for Every Book
OpenLibrary.org

To bring our digital holdings to a broad population, people from
around the world are building a site called OpenLibrary.org. To do
this, the site must be flexible and open. It already incorporates the
catalog from the Library of Congress, all the digital books held by
the Internet Archive, and records of new books from publishers.

This is a bold experiment in openness. Like Wikipedia, the data is
editable and bulk downloadable. We see this as crucial to building a
solid framework that can grow and evolve, as well as to inviting other
non-commercial efforts to leverage the collection.

This means that others can make custom interfaces to the same
data, say for different language speakers. This also means that
some categories of books can have different interfaces: telephone
directories can be treated differently from novels and differently
from travel guides.

This exciting new project will mature over this year based on help
from technical staffs from libraries and individual contributors.



Generous support from the California State Library and the Kahle/
Austin Foundation